Rule-based Measurement Of

Authors

  • Jochen Hipp
  • Felix Naumann
Abstract

Sufficiently high data quality is crucial for almost every application. Nonetheless, data quality issues are nearly omnipresent. The reasons for poor quality cannot simply be blamed on software issues or insufficiently implemented business processes. In our experience, the main reason is that data quality tends strongly to converge down to a level that is inherent to the existing applications. As soon as applications and data are used for tasks other than the established ones they were originally designed for, problems arise. In this paper we extend and evaluate an approach to measuring the accuracy dimension of data quality based on association rules. The rules are used to build a model that is intended to capture normality. This model is then employed to divide the database records into three subsets: “potentially incorrect”, “no decision”, and “probably correct”. We thoroughly evaluate the approach on data from our automotive domain. The results it achieves in identifying incorrect data entries are very promising. In the described setting, it was possible for the first time to highlight a significant number of incorrect data records that would otherwise disappear among the millions of correct records. This ability enables domain experts to understand what is going wrong and how to improve data quality. Moreover, our approach is a first step towards automatically quantifying the overall accuracy of a yet unseen dataset.
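The three-way split described in the abstract can be illustrated with a minimal sketch. It is not the authors' actual model: the single-antecedent rule miner, the scoring scheme, the thresholds, the function names `mine_rules` and `classify`, and the toy `model`/`engine` records are all assumptions made purely for illustration.

```python
from collections import Counter
from itertools import permutations


def mine_rules(records, min_support=3, min_confidence=0.8):
    """Mine rules (attr=val) -> (attr2=val2) by plain counting.

    Hypothetical simplification: only single-item antecedents are considered.
    """
    item_counts = Counter()
    pair_counts = Counter()
    for rec in records:
        items = list(rec.items())
        item_counts.update(items)
        # Ordered pairs of distinct items: (antecedent, consequent).
        pair_counts.update(permutations(items, 2))
    rules = {}
    for (ante, cons), n in pair_counts.items():
        confidence = n / item_counts[ante]
        if n >= min_support and confidence >= min_confidence:
            rules[(ante, cons)] = confidence
    return rules


def classify(record, rules, violation_threshold=0.8):
    """Assign one of the three labels named in the abstract (assumed scoring)."""
    violated = 0.0   # summed confidence of rules the record contradicts
    satisfied = 0    # number of applicable rules the record agrees with
    for ((a_attr, a_val), (c_attr, c_val)), confidence in rules.items():
        if record.get(a_attr) != a_val or c_attr not in record:
            continue  # rule does not apply to this record
        if record[c_attr] == c_val:
            satisfied += 1
        else:
            violated += confidence
    if violated >= violation_threshold:
        return "potentially incorrect"
    if satisfied > 0 and violated == 0:
        return "probably correct"
    return "no decision"


# Toy data (invented): model A almost always has a diesel engine.
records = (
    [{"model": "A", "engine": "diesel"}] * 9
    + [{"model": "A", "engine": "petrol"}]
    + [{"model": "B", "engine": "petrol"}]
)
rules = mine_rules(records)
labels = [classify(r, rules) for r in records]
```

On this toy data, the nine consistent records are labelled "probably correct", the record that contradicts the dominant rule is labelled "potentially incorrect", and the record no rule applies to falls into "no decision".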

Similar resources

Reliability Measures Measurement under Rule-Based Fuzzy Logic Technique

In reliability theory, the reliability measures contend the very important and depreciative role for any system analysis. Measurement of reliability measures is not easy due to ambiguity and vagueness which exist within reliability parameters. It is also very difficult to incorporate a large amount of uncertainty in well-established methodologies and techniques. However, fuzzy logic provides an...

S3PSO: Students’ Performance Prediction Based on Particle Swarm Optimization

Nowadays, new methods are required to take advantage of the rich and extensive gold mine of data given the vast content of data particularly created by educational systems. Data mining algorithms have been used in educational systems especially e-learning systems due to the broad usage of these systems. Providing a model to predict final student results in educational course is a reason for usi...

A rule-based evaluation of ladder logic diagram and timed petri nets for programmable logic controllers

This paper describes an evaluation through a case study by measuring a rule-based approach, which proposed for ladder logic diagrams and Petri nets. In the beginning, programmable logic controllers were widely designed by ladder logic diagrams. When complexity and functionality of manufacturing systems increases, developing their software is becoming more difficult. Thus, Petri nets as a high l...

Determining the Accuracy of Sonography and Naegele's Rule in Estimating the Delivery Date

The Accuracy Determination of the Naegele’s Rule and Sonography for Estimating the Delivery Date R. Dehghani Firouzabadi MD , T. Botorabi , N. Tayebi GP Received: 23/09/06 Sent for Revision: 27/02/07 Received Revised Manuscript: 09/07/07 Accepted: 08/09/07 Background and Objective: Estimation of the gestational age (G.A) and the estimated date of confinement (EDC) are of paramount important fac...

A hybrid BSC-DEMATEL-FIS approach for performance measurement in Food Industry

Organizational performance is a complex issue given that performance is a multifaceted phenomenon whose components may have distinct managerial priorities and may even be mutually inconsistent. Recently, the balanced scorecard approach (BSC), as an effective multi-criteria evaluation concept received much attention in organizational performance measurement. Although the BSC conceptual framework...

Rule-based of Monetary Policy in Iran Inspired by McCallum Rule

Economists have reached a consensus that an independent central bank could improve its policy efficiency by following a monetary policy rule. One of the important rules is McCallum rule where that requires central banks to target the growth rate of nominal GDP using the monetary base as its instrument. One of the features of the McCallum rule uses the monetary base rather than the interest rate...

Publication date: 2007